Fast Estimation of the Pattern Frequency Spectrum

Authors

  • Matthijs van Leeuwen
  • Antti Ukkonen
Abstract

Both exact and approximate counting of the number of frequent patterns for a given frequency threshold are hard problems. Still, having even coarse prior estimates of the number of patterns is useful, as these can be used to appropriately set the threshold and avoid waiting endlessly for an unmanageable number of patterns. Moreover, we argue that the number of patterns for different thresholds is an interesting summary statistic of the data: the pattern frequency spectrum. To enable fast estimation of the number of frequent patterns, we adapt the classical algorithm by Knuth for estimating the size of a search tree. Although the method is known to be theoretically suboptimal, we demonstrate that in practice it not only produces very accurate estimates, but is also very efficient. Moreover, we introduce a small variation that can be used to estimate the number of patterns under constraints for which the Apriori property does not hold. The empirical evaluation shows that this approach obtains good estimates for closed itemsets. Finally, we show how the method, together with isotonic regression, can be used to quickly and accurately estimate the pattern frequency spectrum: the curve that shows the number of patterns for every possible value of the frequency threshold. Comparing such a spectrum to one that was constructed using a random data model immediately reveals whether the dataset contains any structure of interest.
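
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes itemset patterns, a search tree in which each itemset is extended only with items that come later in a fixed order, a naive support count, and a plain pool-adjacent-violators fit on the raw counts. All function names (`knuth_sample`, `estimate_frequent_count`, `isotonic_nondecreasing`, `estimate_spectrum`) are hypothetical.

```python
import random


def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_extensions(transactions, items, itemset, last, minsup):
    """Indices j > last for which itemset + items[j] is still frequent."""
    return [j for j in range(last + 1, len(items))
            if support(transactions, itemset | {items[j]}) >= minsup]


def knuth_sample(transactions, items, minsup, rng):
    """One random root-to-leaf walk; returns an unbiased estimate of the
    number of non-empty frequent itemsets (Knuth's tree-size estimator)."""
    estimate, weight = 0.0, 1.0
    itemset, last = frozenset(), -1
    while True:
        children = frequent_extensions(transactions, items, itemset, last, minsup)
        if not children:
            return estimate
        weight *= len(children)       # product of branching factors so far
        estimate += weight            # estimated number of nodes on this level
        last = rng.choice(children)   # descend into a uniformly chosen child
        itemset = itemset | {items[last]}


def estimate_frequent_count(transactions, minsup, n_samples=1000, seed=0):
    """Average of `n_samples` independent walks (seed is an assumption
    added here only to make runs repeatable)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rng = random.Random(seed)
    total = sum(knuth_sample(transactions, items, minsup, rng)
                for _ in range(n_samples))
    return total / n_samples


def isotonic_nondecreasing(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def estimate_spectrum(transactions, thresholds, n_samples=1000, seed=0):
    """Estimated pattern counts for ascending `thresholds`, smoothed so the
    curve never increases as the threshold grows."""
    raw = [estimate_frequent_count(transactions, s, n_samples, seed)
           for s in thresholds]
    return isotonic_nondecreasing(raw[::-1])[::-1]
```

For example, `estimate_spectrum(data, [1, 2, 3, 4])` returns one smoothed count per threshold. The monotonicity constraint encodes the fact that raising the threshold can only remove frequent patterns, so any increase in the raw estimates is sampling noise.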


Similar resources

9 Power Spectrum and Correlation

The power spectrum reveals the existence, or the absence, of repetitive patterns and correlation structures in a signal process. These structural patterns are important in a wide range of applications such as data forecasting, signal coding, signal detection, radar, pattern recognition, and decision-making systems. The most common method of spectral estimation is based on the fast Fourier transf...


The Changes of Leg Muscle Activities Following Increase of Gait Velocity

Purpose: Evaluating motor control and analyzing its characteristics for the diagnosis of neuromuscular diseases is a new approach in clinical electroneurophysiology, based on changes in electromyography responses and classic reflexes. In this study, the quantitative power spectrum frequency was used to assess changes in motor control strategies. Materials and Methods: Twenty five health...


Estimation of kinematic source parameters and frequency independent shear wave quality factor around Bushehr

In this paper, the shear wave quality factor and source parameters in the near field are estimated by analyzing acceleration data from the Zagros region. Accelerograms recorded by the strong ground motion network of the Building and Housing Research Center were used. The data cover events with magnitudes of 4.7 to 6.3 recorded from 1999 to 2014. In this approach, the theoretical S-wave displ...


Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...


The Effects of Changing Footstrike Pattern on the Amplitude and Frequency Spectrum of Ground Reaction Forces During Running in Individuals With Pronated Feet

Background: The current study aimed to evaluate the effects of barefoot and shod running with two different styles on ground reaction force-frequency content in recreational runners with low arched feet. Methods: The statistical sample of this research was 13 males with PF (mean±SD age: 26.2±2.8 y; height: 176.1±8.4 cm; weight: 78.3±14.3 kg). A force plate (Bertec, USA) with a sample rate of 1...




Publication date: 2014